Abstract

Mateescu, Salomaa, and Yu asked whether it is decidable if a given subword history assumes only non-negative values for all words over a given alphabet. In this paper, we solve this open problem by proving that the problem is undecidable, even under stronger conditions than originally supposed.

1 Subword history and inequality 

Let $\Sigma$ be an alphabet, and let $\Sigma^*$ denote the set of all words over $\Sigma$, including the empty word $\lambda$.

Parikh mappings (vectors), introduced in [10], provide us with numerical properties of a word and of a set of words. With a fixed ordering of the letters in $\Sigma = \{a_1, \ldots, a_n\}$ in mind, the Parikh mapping of a word $w$ is $(|w|_{a_1}, |w|_{a_2}, \ldots, |w|_{a_n})$, where $|w|_a$ denotes the number of occurrences of a letter $a \in \Sigma$ in a word $w \in \Sigma^*$ (for instance, $|aab|_a = 2$ and $|aab|_b = 1$). This idea can be generalized to counting in $w$ the number of occurrences of another word $u$ as a (continuous) subword or as a scattered subword. The latter is of particular interest here. In general, $u$ is a scattered subword of $w$ if there exist an integer $k \ge 1$ and words $x_1, \ldots, x_k, y_0, y_1, \ldots, y_k$, some of which are possibly empty, such that

$$u = x_1 \cdots x_k \quad \text{and} \quad w = y_0 x_1 y_1 \cdots x_k y_k.$$

For the various terminologies in use, the reader is referred to [12]. We can then generalize the notation $|w|_a$ to $|w|_u$, the number of occurrences of $u$ as a scattered subword of $w$. For instance, $|aab|_{ab} = 2$ because both occurrences of $a$ precede the occurrence of $b$. Following a convention made in [7], we assume that $|w|_\lambda = 1$ for the empty word $\lambda$ and any word $w \in \Sigma^*$.
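As a concrete illustration (not taken from [7]), $|w|_u$ can be computed with a standard dynamic program over the prefixes of $u$. The Python function below is a minimal sketch. Its name, `count_scattered`, is our own hypothetical choice.

```python
def count_scattered(w: str, u: str) -> int:
    """Return |w|_u, the number of occurrences of u as a scattered subword of w.

    dp[j] counts the embeddings of the prefix u[:j] into the part of w read so far.
    dp[0] = 1 encodes the convention |w|_lambda = 1 for the empty word lambda.
    """
    dp = [1] + [0] * len(u)
    for ch in w:
        # Iterate j downwards so that the current letter of w is used at most
        # once within each embedding.
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


print(count_scattered("aab", "a"))   # 2
print(count_scattered("aab", "ab"))  # 2
print(count_scattered("aab", ""))    # 1, the convention for the empty word
```

The table has $|u| + 1$ entries, so the computation uses $O(|w| \cdot |u|)$ additions.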

The numbers of scattered subword occurrences can provide more information about the word $w$ itself than the Parikh mapping does. For $\Sigma = \{a, b\}$, the Parikh mapping $(3, 3)$ admits as $w$ all 20 words in $aaa \sqcup\!\sqcup\, bbb$, such as $ababba$, where $\sqcup\!\sqcup$ denotes the shuffle operation. Adding the condition $|w|_{ab} = 8$ to this Parikh mapping reduces the candidates for $w$ to $aababb$ alone [7]. More elaborate conditions can be expressed by adding and/or multiplying such counts. For instance, $|w|_a \times |w|_b = 4$ implies that $w \in (a \sqcup\!\sqcup\, bbbb) \cup (aa \sqcup\!\sqcup\, bb) \cup (aaaa \sqcup\!\sqcup\, b)$. This idea led Mateescu, Salomaa, and Yu to propose the notion of subword history as follows.
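The examples above can be checked by brute force. The following Python sketch is an illustration only. The helper `count_scattered` is hypothetical and implements a dynamic program for $|w|_u$. The sketch enumerates the words concerned.

```python
from itertools import product


def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


# All words over {a, b} with Parikh vector (3, 3), i.e. the shuffle of aaa and bbb.
candidates = [
    "".join(t) for t in product("ab", repeat=6)
    if t.count("a") == 3 and t.count("b") == 3
]
print(len(candidates))  # 20

# Adding the condition |w|_ab = 8 leaves a single word.
print([w for w in candidates if count_scattered(w, "ab") == 8])  # ['aababb']

# Words with |w|_a * |w|_b = 4; lengths above 5 are impossible here.
product_4 = [
    "".join(t) for n in range(6) for t in product("ab", repeat=n)
    if t.count("a") * t.count("b") == 4
]
print(len(product_4))  # 5 + 6 + 5 = 16
```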
Definition 1 ([7]). A subword history in $\Sigma$ and its value in a word $w$ are defined recursively as follows:

• Every word $u$ in $\Sigma^*$ is a subword history in $\Sigma$, referred to as a monomial, and its value in $w$ equals $|w|_u$.

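• Assume that $SH_1$ and $SH_2$ are subword histories with values $\alpha_1$ and $\alpha_2$, respectively. Then

$$-(SH_1), \quad (SH_1) + (SH_2), \quad \text{and} \quad (SH_1) \times (SH_2)$$

are subword histories with respective values

$$-\alpha_1, \quad \alpha_1 + \alpha_2, \quad \text{and} \quad \alpha_1 \alpha_2.$$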

The notation $|w|_u$ is now further generalized to $|w|_{SH}$, which denotes the value of a subword history $SH$ in $w$.
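Definition 1 translates directly into a recursive evaluator. In the Python sketch below, a subword history is encoded as nested tuples. This encoding and the function names are our own assumptions, not notation from [7].

```python
def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


# Hypothetical tuple encoding of Definition 1:
#   ("word", u)        the monomial u, with value |w|_u
#   ("neg", SH)        -(SH)
#   ("add", SH1, SH2)  (SH1) + (SH2)
#   ("mul", SH1, SH2)  (SH1) x (SH2)
def value(w: str, sh: tuple) -> int:
    """Return |w|_SH by structural recursion on the subword history."""
    op = sh[0]
    if op == "word":
        return count_scattered(w, sh[1])
    if op == "neg":
        return -value(w, sh[1])
    if op == "add":
        return value(w, sh[1]) + value(w, sh[2])
    if op == "mul":
        return value(w, sh[1]) * value(w, sh[2])
    raise ValueError(f"unknown operation: {op!r}")


a, b, ab = ("word", "a"), ("word", "b"), ("word", "ab")
print(value("aab", ("mul", a, b)))            # |aab|_a * |aab|_b = 2 * 1 = 2
print(value("aab", ("add", ab, ("neg", b))))  # |aab|_ab - |aab|_b = 2 - 1 = 1
```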

For a positive integer $e$ and a subword history $SH$, we denote $\underbrace{SH \times SH \times \cdots \times SH}_{e \text{ times}}$ by $\prod^{e} SH$. For instance, $\prod^{2} SH = SH \times SH$ and $\prod^{3} SH = SH \times SH \times SH$. Let us set $\prod^{0} SH$ to be $\lambda$ for any subword history $SH$. In light of the next proposition, this setting does not contradict the convention that $|w|_\lambda = 1$ for any word $w$.

Proposition 1. Let $SH$ be a subword history in $\Sigma$ with value $\alpha$, let $c$ be an integer, and let $e$ be a non-negative integer. Then $c(SH)$ and $\prod^{e} SH$ are subword histories with respective values $c\alpha$ and $\alpha^{e}$.
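Both constructions in Proposition 1 need only the three operations of Definition 1. The Python sketch below makes this explicit. It uses a hypothetical tuple encoding: `("word", u)`, `("neg", SH)`, `("add", SH1, SH2)`, and `("mul", SH1, SH2)`. The names `power` and `scale` are our own.

```python
def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def value(w: str, sh: tuple) -> int:
    # |w|_SH for the tuple encoding described above.
    op = sh[0]
    if op == "word":
        return count_scattered(w, sh[1])
    if op == "neg":
        return -value(w, sh[1])
    if op == "add":
        return value(w, sh[1]) + value(w, sh[2])
    if op == "mul":
        return value(w, sh[1]) * value(w, sh[2])
    raise ValueError(f"unknown operation: {op!r}")


LAMBDA = ("word", "")  # the empty word lambda, with value 1 in every word


def power(sh: tuple, e: int) -> tuple:
    """prod^e SH = SH x ... x SH (e times); prod^0 SH is lambda."""
    if e == 0:
        return LAMBDA
    result = sh
    for _ in range(e - 1):
        result = ("mul", result, sh)
    return result


def scale(c: int, sh: tuple) -> tuple:
    """c(SH) as a sum of |c| copies of SH (or of -(SH) when c < 0)."""
    term = sh if c >= 0 else ("neg", sh)
    result = ("add", LAMBDA, ("neg", LAMBDA))  # lambda - lambda, value 0
    for _ in range(abs(c)):
        result = ("add", result, term)
    return result


a = ("word", "a")
print(value("aab", power(a, 3)))   # 2 ** 3 = 8
print(value("aab", power(a, 0)))   # 2 ** 0 = 1
print(value("aab", scale(-3, a)))  # -3 * 2 = -6
```

Building $c(SH)$ by repeated addition makes its size grow linearly with $|c|$. It only serves to show that the operations of Definition 1 suffice.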

Two subword histories $SH_1$ and $SH_2$ are equivalent if $|w|_{SH_1} = |w|_{SH_2}$ for every word $w \in \Sigma^*$. It is not difficult to observe that the subword histories $a \times b$ and $ab + ba$ assume the same value in any word (see [7]). Each pair consisting of an occurrence of $a$ and an occurrence of $b$ in $w$ is counted exactly once, by $ab$ or by $ba$, according to which of the two comes first. These two subword histories are hence equivalent. A subword history is linear if it is obtained without using the operation $\times$. We say that a linear subword history is of degree $n$ if its longest monomial is of length $n$. For instance, the degree of $abb + 2c + 3$ is 3 due to its first term. More generally, we can define the degree of a subword history as the minimum degree among its equivalent linear subword histories.
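Equivalences such as that of $a \times b$ and $ab + ba$ can be sanity-checked by brute force on short words. The Python sketch below does so. It is only a bounded test, not the decision procedure of [7], and the helper names are hypothetical. The second call shows that $a \times a$ and $aa$ are not equivalent.

```python
from itertools import product


def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def first_difference(f, g, alphabet: str, max_len: int):
    """Return the first word of length <= max_len on which f and g differ, or None.

    This cannot prove equivalence, which concerns all words of Sigma*.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if f(w) != g(w):
                return w
    return None


def a_times_b(w: str) -> int:
    return count_scattered(w, "a") * count_scattered(w, "b")


def ab_plus_ba(w: str) -> int:
    return count_scattered(w, "ab") + count_scattered(w, "ba")


def a_times_a(w: str) -> int:
    return count_scattered(w, "a") ** 2


def aa(w: str) -> int:
    return count_scattered(w, "aa")


print(first_difference(a_times_b, ab_plus_ba, "abc", 6))  # None
print(first_difference(a_times_a, aa, "ab", 3))           # a
```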

Mateescu, Salomaa, and Yu proposed a method of constructing, from a given subword history, an equivalent linear subword history. As a corollary, the equivalence of two given subword histories turned out to be decidable [7]. In that paper, the authors called for continued research on inequalities between subword histories. Specifically, they left the following problem open: for a given subword history $SH$, is it decidable whether $|w|_{SH} \ge 0$ holds for every word $w \in \Sigma^*$? Let us call this problem SubwordIneqAbsoluteness. From the point of view of decidability, it is irrelevant whether this problem is formalized with $\ge$ or with $>$. Indeed, since all values are integers, deciding whether $|w|_{SH} > 0$ holds for every word $w \in \Sigma^*$ is equivalent to deciding whether $|w|_{SH - \lambda} \ge 0$ holds for every such $w$. Note that $SH - \lambda$ is a valid subword history with value $|w|_{SH} - 1$.
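A negative answer to SubwordIneqAbsoluteness is witnessed by a single word, so one can at least search for a counterexample. The following Python sketch is our own illustration, and its names and example history are hypothetical. It enumerates words up to a length bound and applies the $SH - \lambda$ trick for the strict version. A search that finds nothing proves nothing, because longer words are never examined.

```python
from itertools import product


def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def find_negative(sh_value, alphabet: str, max_len: int):
    """Return a word of length <= max_len with a negative value, or None.

    A returned word refutes "|w|_SH >= 0 for every w"; None proves nothing,
    because longer words are not examined.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if sh_value(w) < 0:
                return w
    return None


def sh(w: str) -> int:
    # SH = aa - a + lambda, whose value is n(n-1)/2 - n + 1 for n = |w|_a.
    return count_scattered(w, "aa") - count_scattered(w, "a") + 1


def sh_minus_lambda(w: str) -> int:
    # SH - lambda is non-negative on every word iff SH is positive on every word.
    return sh(w) - 1


print(find_negative(sh, "ab", 8))               # None: SH >= 0 on these words
print(find_negative(sh_minus_lambda, "ab", 8))  # a: |a|_SH = 0, so SH > 0 fails
```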
2 Main results

In this section, we prove that SubwordIneqAbsoluteness is undecidable even under strong restrictions (Corollary 2). This is the main contribution of this paper.

First of all, we show that this problem is at least as hard as the problem of deciding, for two given subword histories $SH_1$ and $SH_2$, whether there exists a word $w \in \Sigma^*$ such that $|w|_{SH_1} = |w|_{SH_2}$. Let us call the latter problem SubwordEqSolvability. If needed, the reader can consult [2] on undecidability, polynomial-time Karp reductions, and NP-hardness.

Lemma 1. SubwordEqSolvability is polynomial-time Karp reducible to SubwordIneqAbsoluteness.

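Proof. Assume that two subword histories $SH_1$ and $SH_2$ are given as an instance of SubwordEqSolvability. Let $SH = SH_1 - SH_2$. Then the answer to this instance is no if and only if $|w|_{SH \times SH} > 0$ for every word $w \in \Sigma^*$, that is, if and only if $|w|_{SH \times SH - \lambda} \ge 0$ for every $w \in \Sigma^*$. Note that $SH \times SH$ is a valid subword history (Proposition 1), and its value in $w$ is $(|w|_{SH})^2$, which is positive exactly when $|w|_{SH_1} \neq |w|_{SH_2}$. □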
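The reduction in the proof of Lemma 1 is purely syntactic. The Python sketch below builds $(SH_1 - SH_2) \times (SH_1 - SH_2) - \lambda$ and checks the claimed equivalence on short words. It uses a hypothetical tuple encoding of subword histories (`("word", u)`, `("neg", SH)`, `("add", SH1, SH2)`, `("mul", SH1, SH2)`). The function names and the example pair $ab$, $ba$ are our own assumptions.

```python
from itertools import product


def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def value(w: str, sh: tuple) -> int:
    # |w|_SH for the tuple encoding described above.
    op = sh[0]
    if op == "word":
        return count_scattered(w, sh[1])
    if op == "neg":
        return -value(w, sh[1])
    if op == "add":
        return value(w, sh[1]) + value(w, sh[2])
    if op == "mul":
        return value(w, sh[1]) * value(w, sh[2])
    raise ValueError(f"unknown operation: {op!r}")


LAMBDA = ("word", "")  # the empty word, value 1 in every word


def eq_to_ineq(sh1: tuple, sh2: tuple) -> tuple:
    """Map (SH1, SH2) to (SH1 - SH2) x (SH1 - SH2) - lambda.

    The result is non-negative on a word w exactly when |w|_SH1 != |w|_SH2.
    """
    diff = ("add", sh1, ("neg", sh2))
    return ("add", ("mul", diff, diff), ("neg", LAMBDA))


sh1, sh2 = ("word", "ab"), ("word", "ba")
target = eq_to_ineq(sh1, sh2)
print(value("abba", target))  # |abba|_ab = |abba|_ba = 2, so 0 - 1 = -1
print(value("aab", target))   # (2 - 0)^2 - 1 = 3

# Bounded check on all words over {a, b} of length <= 5.
ok = all(
    (value(w, target) >= 0) == (value(w, sh1) != value(w, sh2))
    for n in range(6)
    for w in map("".join, product("ab", repeat=n))
)
print(ok)  # True
```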

Therefore, in order to prove the undecidability of SubwordIneqAbsoluteness, it suffices to prove that SubwordEqSolvability is undecidable.

Theorem 1. SubwordEqSolvability is undecidable. 

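Proof. This proof is based on the undecidability of the solvability of Diophantine equations, proved by Matiyasevich [8] as the answer to Hilbert's tenth problem. Let a Diophantine equation

$$\sum_{1 \le i \le \ell} c_i\, x_1^{e_{i,1}} x_2^{e_{i,2}} \cdots x_m^{e_{i,m}} = 0$$

be given, where $\ell \ge 1$, $c_1, \ldots, c_\ell$ are integer constants, $x_1, x_2, \ldots, x_m$ are variables ranging over the non-negative integers, and $e_{i,1}, e_{i,2}, \ldots, e_{i,m}$ are non-negative integer exponents for $1 \le i \le \ell$. (It is well known that the attention can equally be restricted to positive integer variables; see [11]. Non-negative values are the natural domain here, because a word need not contain every letter.)

Let $\Sigma = \{a_1, \ldots, a_m\}$. Consider a word $w$ in $a_1^{n_1} \sqcup\!\sqcup\, a_2^{n_2} \sqcup\!\sqcup \cdots \sqcup\!\sqcup\, a_m^{n_m}$ for some non-negative integers $n_1, \ldots, n_m$. Then for $1 \le j \le m$, we have

$$|w|_{a_j} = n_j.$$

Proposition 1 implies that $\prod^{e_{i,j}} a_j$ is a subword history for any $1 \le i \le \ell$ and $1 \le j \le m$, and its value in $w$ is $n_j^{e_{i,j}}$. Using the proposition once again, we see that

$$c_i \Big( \prod^{e_{i,1}} a_1 \times \prod^{e_{i,2}} a_2 \times \cdots \times \prod^{e_{i,m}} a_m \Big)$$

is a subword history whose value in $w$ is $c_i n_1^{e_{i,1}} n_2^{e_{i,2}} \cdots n_m^{e_{i,m}}$. Let us denote this subword history by $SH_i$, and let $SH = \sum_{1 \le i \le \ell} SH_i$, which is also a subword history. Now it should be clear that

$$|w|_{SH} = \sum_{1 \le i \le \ell} c_i n_1^{e_{i,1}} n_2^{e_{i,2}} \cdots n_m^{e_{i,m}}. \qquad (1)$$

This is the value obtained by substituting $(n_1, \ldots, n_m)$ into the left-hand side of the given Diophantine equation. Therefore, if the Diophantine equation has a non-negative integer solution $(n_1, n_2, \ldots, n_m)$, then $|w|_{SH} = 0$ for such a $w$. Conversely, assume that there exists a word $v \in \Sigma^*$ such that $|v|_{SH} = 0$. Since $v$ belongs to $a_1^{|v|_{a_1}} \sqcup\!\sqcup \cdots \sqcup\!\sqcup\, a_m^{|v|_{a_m}}$, Definition 1 and Eq. (1) give

$$|v|_{SH} = \sum_{1 \le i \le \ell} c_i\, |v|_{a_1}^{e_{i,1}} |v|_{a_2}^{e_{i,2}} \cdots |v|_{a_m}^{e_{i,m}}.$$

Since this value is 0, $(|v|_{a_1}, |v|_{a_2}, \ldots, |v|_{a_m})$ is a non-negative integer solution to the given Diophantine equation. Consequently, if SubwordEqSolvability were decidable, then applying it to the pair consisting of $SH$ and $\lambda + (-\lambda)$ (whose value is 0 in every word) would determine the solvability of the given Diophantine equation, a contradiction. □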
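The construction in the proof of Theorem 1 can be sketched in Python as follows. The tuple encoding, the function names, and the example equation $x_1^2 - 2x_2 = 0$ are our own assumptions. Multiplication by $c_i$ is realized by repeated addition. That is adequate for illustration but is not a compact encoding of large coefficients.

```python
from itertools import product
from math import prod


def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def value(w: str, sh: tuple) -> int:
    # |w|_SH for the hypothetical encoding ("word", u), ("neg", SH),
    # ("add", SH1, SH2), ("mul", SH1, SH2).
    op = sh[0]
    if op == "word":
        return count_scattered(w, sh[1])
    if op == "neg":
        return -value(w, sh[1])
    if op == "add":
        return value(w, sh[1]) + value(w, sh[2])
    if op == "mul":
        return value(w, sh[1]) * value(w, sh[2])
    raise ValueError(f"unknown operation: {op!r}")


LAMBDA = ("word", "")                    # value 1 in every word
ZERO = ("add", LAMBDA, ("neg", LAMBDA))  # value 0 in every word


def power(sh: tuple, e: int) -> tuple:
    # prod^e SH, with prod^0 SH = lambda.
    result = LAMBDA if e == 0 else sh
    for _ in range(e - 1):
        result = ("mul", result, sh)
    return result


def scale(c: int, sh: tuple) -> tuple:
    # c(SH) by repeated addition of SH (or of -(SH) when c < 0).
    term = sh if c >= 0 else ("neg", sh)
    result = ZERO
    for _ in range(abs(c)):
        result = ("add", result, term)
    return result


def diophantine_to_sh(terms: list, letters: str) -> tuple:
    """Sum over i of c_i (prod^{e_i1} a_1 x ... x prod^{e_im} a_m).

    terms holds pairs (c_i, (e_i1, ..., e_im)); letters holds a_1, ..., a_m.
    The extra leading lambda factor in each product does not change its value.
    """
    sh = ZERO
    for c, exps in terms:
        mono = LAMBDA
        for letter, e in zip(letters, exps):
            mono = ("mul", mono, power(("word", letter), e))
        sh = ("add", sh, scale(c, mono))
    return sh


def poly(terms: list, xs: tuple) -> int:
    # Left-hand side of the Diophantine equation evaluated at xs.
    return sum(c * prod(x ** e for x, e in zip(xs, exps)) for c, exps in terms)


# Hypothetical example equation: x1^2 - 2 x2 = 0, with x1 -> a and x2 -> b.
terms = [(1, (2, 0)), (-2, (0, 1))]
sh = diophantine_to_sh(terms, "ab")
print(value("aabb", sh), value("abab", sh), value("aab", sh))  # 0 0 2

# Eq. (1) on all words over {a, b} of length <= 5: |w|_SH equals the
# polynomial evaluated at (|w|_a, |w|_b).
ok = all(
    value(w, sh) == poly(terms, (w.count("a"), w.count("b")))
    for n in range(6)
    for w in map("".join, product("ab", repeat=n))
)
print(ok)  # True
```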

Corollary 1. SubwordIneqAbsoluteness is undecidable.

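As mentioned previously, the equivalence of two subword histories is decidable. This does not contradict Corollary 1, because equivalence asks whether $|w|_{SH_1 - SH_2} = 0$ for every word $w$, not whether $|w|_{SH} \ge 0$ for every word $w$.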

It is worth observing that in the proof of Theorem 1, we reduce a given Diophantine equation to an element of a restricted class of subword histories, which we call letter-restricted subword histories. A letter-restricted subword history is defined as in Definition 1, except that every monomial is required to be a letter in $\Sigma$ or $\lambda$.

It is well known that the solvability of Diophantine equations remains undecidable even if the number of variables is limited to 9 [5]. In the proof of Theorem 1, the number of variables equals the number of letters in $\Sigma$. Thus, over an alphabet of 9 letters, SubwordEqSolvability is undecidable, and hence so is SubwordIneqAbsoluteness. Combining this with the observation in the previous paragraph, we now present our strongest result to date on the undecidability of these problems.

Theorem 2. If the solvability of Diophantine equations in $n$ variables is undecidable, then SubwordEqSolvability and SubwordIneqAbsoluteness are undecidable even for the class of letter-restricted subword histories over an alphabet of $n$ letters.

Corollary 2. SubwordEqSolvability and SubwordIneqAbsoluteness are undecidable even for the class of letter-restricted subword histories over a nonary alphabet.

Corollary 2 does not mean that SubwordEqSolvability or SubwordIneqAbsoluteness is decidable over an alphabet of size at most 8. It has been conjectured that the solvability of Diophantine equations remains undecidable even for three variables. If so, then Theorem 2 implies that these problems are undecidable even for the class of letter-restricted subword histories over a ternary alphabet.

How small must the alphabet be for these problems to become decidable? We must leave this question unsettled in this paper, but we can provide a result illustrating how hard SubwordIneqAbsoluteness is. Manders proved that deciding the solvability of a given Diophantine equation of the form $c_1 x^2 + c_2 y + c_3 = 0$ is NP-complete [6]. Our construction of a subword history from a given Diophantine equation in the proof of Theorem 1 can be carried out in polynomial time. In addition, the subword history thus constructed can be transformed in polynomial time, by the above-mentioned product elimination of Mateescu, Salomaa, and Yu, into the linear subword history $c_1 a + 2c_1 aa + c_2 b + c_3$ of degree 2, where the letters $a$ and $b$ correspond to the variables $x$ and $y$, respectively (note that $|w|_a^2 = |w|_a + 2|w|_{aa}$). With Lemma 1, we can prove the following theorem, though it does not settle the question at the beginning of this paragraph.

Theorem 3. SubwordEqSolvability and SubwordIneqAbsoluteness are NP-hard even for the class of letter-restricted subword histories of degree 2 over a binary alphabet.
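The identity behind this linearization is $|w|_a^2 = |w|_a + 2|w|_{aa}$, since $|w|_{aa} = \binom{|w|_a}{2}$. The Python sketch below checks it on all short words over $\{a, b\}$. It also checks the resulting correspondence with $c_1x^2 + c_2y + c_3$ for one arbitrary choice of coefficients. The helper names and the coefficients are hypothetical.

```python
from itertools import product


def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def linear_sh_value(w: str, c1: int, c2: int, c3: int) -> int:
    """Value of c1 a + 2 c1 aa + c2 b + c3, with c3 standing for c3 copies of lambda."""
    return (c1 * count_scattered(w, "a") + 2 * c1 * count_scattered(w, "aa")
            + c2 * count_scattered(w, "b") + c3)


def quadratic(x: int, y: int, c1: int, c2: int, c3: int) -> int:
    return c1 * x * x + c2 * y + c3


# Check |w|_a^2 = |w|_a + 2|w|_aa, and the correspondence with
# c1 x^2 + c2 y + c3 for x = |w|_a and y = |w|_b, on short words.
ok = all(
    count_scattered(w, "a") ** 2 == count_scattered(w, "a") + 2 * count_scattered(w, "aa")
    and linear_sh_value(w, 3, -5, 7) == quadratic(w.count("a"), w.count("b"), 3, -5, 7)
    for n in range(7)
    for w in map("".join, product("ab", repeat=n))
)
print(ok)  # True
```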

3 System of Diophantine equations 

In this section, we take a brief look at the polynomial-time Karp reduction from a given system of Diophantine equations to a subword inequality. The reduction itself should be evident from our proof of Theorem 1, but let us spend some space on it because of its implication for a significant problem called the preimage problem.

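A system of Diophantine equations is a finite collection $(Eq_1, Eq_2, \ldots, Eq_k)$ of Diophantine equations. Using our method, the equations $Eq_1, \ldots, Eq_k$ are transformed into the respective subword histories $SH_1, \ldots, SH_k$. From them, we construct the following subword history:

$$SH := \prod_{1 \le i \le k} \big( (SH_i \times SH_i) + \lambda \big).$$

Then, for $w \in \Sigma^*$, $|w|_{SH} = 1$ if and only if $|w|_{SH_i} = 0$ for all $1 \le i \le k$, since each factor has value $|w|_{SH_i}^2 + 1 \ge 1$. Since $SH$ always assumes a positive integer value, deciding whether $|w|_{SH} = 1$ for some $w$ can be phrased both as an equation and as an inequality. Such a word exists if and only if it is not the case that $|w|_{SH - 2\lambda} \ge 0$ for every $w \in \Sigma^*$.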
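The product construction for systems can be sketched in Python as follows. It uses a hypothetical tuple encoding of subword histories (`("word", u)`, `("neg", SH)`, `("add", SH1, SH2)`, `("mul", SH1, SH2)`). The example system $|w|_a - |w|_b = 0$, $|w|_{ab} - |w|_{ba} = 0$ and all names are our own assumptions. A leading factor $\lambda$ is used for convenience and does not change the value.

```python
def count_scattered(w: str, u: str) -> int:
    # |w|_u by dynamic programming over prefixes of u; |w|_lambda = 1.
    dp = [1] + [0] * len(u)
    for ch in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(u)]


def value(w: str, sh: tuple) -> int:
    # |w|_SH for the tuple encoding described above.
    op = sh[0]
    if op == "word":
        return count_scattered(w, sh[1])
    if op == "neg":
        return -value(w, sh[1])
    if op == "add":
        return value(w, sh[1]) + value(w, sh[2])
    if op == "mul":
        return value(w, sh[1]) * value(w, sh[2])
    raise ValueError(f"unknown operation: {op!r}")


LAMBDA = ("word", "")  # value 1 in every word


def system_sh(histories: list) -> tuple:
    """prod_i ((SH_i x SH_i) + lambda); its value is 1 iff every SH_i has value 0."""
    sh = LAMBDA
    for sh_i in histories:
        sh = ("mul", sh, ("add", ("mul", sh_i, sh_i), LAMBDA))
    return sh


a, b, ab, ba = (("word", u) for u in ("a", "b", "ab", "ba"))
sh = system_sh([("add", a, ("neg", b)), ("add", ab, ("neg", ba))])
print(value("abba", sh))  # (0 + 1) * (0 + 1) = 1
print(value("ab", sh))    # (0 + 1) * (1 + 1) = 2
print(value("aab", sh))   # (1 + 1) * (4 + 1) = 10
```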

Given a subword history $SH$ and a word $w \in \Sigma^*$, computing the value of $SH$ in $w$ is a pen-and-paper calculation, and this remains so no matter how many subword histories are given. From the subword histories $SH_1, \ldots, SH_k$ and the values $n_1, \ldots, n_k$ thus calculated from $w$, we can build the following system of subword equations:

$$SH_1 = n_1, \quad \ldots, \quad SH_k = n_k,$$

and then hide $w$. Can we find $w$, or, more modestly, can we narrow down the candidates for $w$? In Section 1, an example from [7] was cited to show the uniqueness of the word $w \in \{a, b\}^*$ satisfying $|w|_a = |w|_b = 3$ and $|w|_{ab} = 8$. In the above framework, this amounts to finding $w$ when $(3, 3, 8)$ is given (assuming that we know which subword history each coordinate of this vector corresponds to). Problems of this type are collectively termed preimage problems (see, e.g., [HIS] for a preimage problem in chemoinformatics). Preimage problems can be formalized not only for words but for various objects, such as graphs, as long as some of their properties can be quantified. The reduction of systems of Diophantine equations described above, however, demonstrates how computationally hard the preimage problem is even for words. One reason for this hardness is that, in our current formalization, counting the occurrences of a subword ranges over the whole of a given word (global scope). Hence, if the problems were reformulated so as to confine the search range, the reformulated preimage problem could even be solved efficiently. In [1], Akutsu and Fukagawa counted only occurrences of words as continuous subwords and showed that, in this setting, the preimage problem can be solved in polynomial time.

4 Concluding remarks, discussions, and future directions

In this paper, we proved that it is undecidable whether there exists a word in which an equation between two given subword histories holds. Via the polynomial-time Karp reduction, this answered the open problem posed by Mateescu, Salomaa, and Yu in [7]. The problem was shown to remain undecidable even under restrictions on the size of the alphabet, on the class of subword histories considered, and on the length of the monomials involved. As such, our main results are stronger than a mere solution to the original open problem.

The results in this paper are oriented toward unsolvability and therefore cannot exploit the many known decidability results on the solvability of Diophantine equations (see …). This motivates research on characterizing the subword histories that are polynomial-time Karp reducible to Diophantine equations whose solvability is decidable. It may be worth recalling that Diophantine equations are reduced to the very restricted class $\mathcal{SH}$ of letter-restricted subword histories. Thus, for any class of subword histories that does not contain $\mathcal{SH}$ as a subset, it remains unknown whether SubwordEqSolvability or SubwordIneqAbsoluteness is decidable. The most significant difference between Diophantine equations and equations on subword histories is that the latter are defined over the free monoid $\Sigma^*$, which is not commutative. In this paper, this difference has barely been encountered, because our attention was mainly on the class of letter-restricted subword histories, for which commutativity hardly matters. This observation suggests that combinatorics on words will play an important role in work on the above-mentioned problems (see [4] and the references therein).